Abstract

The problem of nonparametric estimation of the conditional density of a response, given
a vector of explanatory variables, is classical and of prominent importance in many prediction
problems, since the conditional density provides a more comprehensive description of the
association between the response and the predictor than, for instance, does the regression function.
The problem has applications across different fields like economics, actuarial sciences and
medicine. We investigate empirical Bayes estimation of conditional densities, establishing that an
automatic data-driven selection of the prior hyper-parameters in infinite mixtures of Gaussian
kernels, with predictor-dependent mixing weights, can lead to estimators whose performance
is on par with that of frequentist estimators in two respects: they are minimax-optimal (up to
logarithmic factors) rate adaptive over classes of locally Hölder smooth conditional densities, and
they perform an adaptive dimension reduction if the response is independent of (some of) the
explanatory variables which, containing no information about the response, are irrelevant to the
purpose of estimating its conditional density.


1 Introduction 


The problem of estimating the conditional density of a response, given a set of predictors, is classical
and of primary importance in real data analysis, since the conditional density provides a more
comprehensive description of the association between the response and the predictors than, for
instance, does the conditional expectation or regression function, which can only capture partial
aspects of it. The conditional density contains information on how the different features of the
response distribution, like skewness, shape and so on, change with the covariates. Conditional
density estimation for predictive purposes has applications across different fields like economics,
actuarial sciences and medicine.

Nonparametric estimation of a collection of conditional densities over a covariate space presents
two main features: (a) the multivariate curve may have different regularity levels along different
directions, (b) the function may depend only on a subset of the covariates. The goal is to estimate
a multivariate function of the relevant predictors, while discarding the remaining ones, and to obtain
procedures that simultaneously adapt to the unknown dimension of the predictor and to the possibly
anisotropic regularity of the function. Classical references on nonparametric conditional density
estimation taking a frequentist approach are Efromovich (2007, 2010) and Hall et al. (2004); see
also the recent contribution by Bertin et al. (2015). The problem of conditional density estimation
has been studied taking a Bayesian nonparametric approach only recently, and popular methods
are based on generalized stick-breaking process mixture models for which supporting results, in
terms of frequentist asymptotic properties of posterior distributions, have been given by
Pati et al. (2013) and Norets and Pati (2014). The former article provides sufficient conditions for posterior
consistency in conditional density estimation for a broad class of predictor-dependent mixtures of
Gaussian kernels. The latter presents results on posterior contraction rates for conditional density
estimation over classes of locally (isotropic) Hölder smooth densities using finite mixtures of Gaussian
kernels, with covariate-dependent mixing weights having a special structure. The entailed density
estimation procedure converges at a rate that automatically adapts to the unknown dimension of the
set of relevant covariates, thus ultimately performing a dimension reduction, and to the regularity
level of the sampling conditional density.

In this note, the focus is on defining procedures for conditional density estimation that attain
minimax rates (up to log-factors) of posterior concentration, adapting to both the dimension of the
set of relevant covariates and the regularity level of the function. We consider a procedure based
on infinite mixtures of Gaussian kernels, with the same predictor-dependent mixing weights as in
Norets and Pati (2014), and show that its performance can be on par with that of the procedure
proposed by the above cited authors in terms of rate adaptation to the predictor dimension and to
the (isotropic) regularity level. Under the same set of assumptions on the data generating process
and the prior law, the performance of the conditional density estimation procedure of an empirical
Bayesian, who considers an automatic data-driven selection of the prior hyper-parameters, matches
that of an “honest” Bayesian. We deal in detail with the isotropic case; extension of the result
to the anisotropic case follows along the same lines.

The organization of the article is as follows. Section 1.1 sets up the notation. Section 2 presents
the main results on adaptive empirical Bayes posterior concentration at minimax-optimal $L^1$-rates
(up to log-factors) for locally Hölder smooth conditional densities, with contextual adaptive dimension
reduction in the presence of irrelevant covariates. Final remarks and comments are gathered
in Section 3. The statement of a theorem invoked in the proof of the main result is reported in the
Appendix for easy reference.


1.1 Notation 

Let $\mathbb{N}_0 = \{0, 1, \ldots\}$ be the set of non-negative integers and $\mathbb{R}_+$ that of positive real numbers. For
any $a, b \in \mathbb{R}$, we denote by $a \wedge b$ their minimum and by $a \vee b$ their maximum. We write $\lesssim$ and $\gtrsim$
for inequalities valid up to a constant multiple which is universal or inessential for our purposes. For
a generic sequence $\{a_n\}$, we use the notation $a_n = o(1)$ ($n \to \infty$) to mean that $a_n \to 0$ as $n \to \infty$.
For sequences $\{a_n\}$ and $\{b_n\}$, by writing $a_n = O(b_n)$ ($n \to \infty$) we mean that $b_n \neq 0$ and there exists
a constant $K > 0$ so that $|a_n/b_n| \le K$ for every $n \in \mathbb{N}$.

For $d_x \in \mathbb{N}$, let $\mathcal{X} \subseteq \mathbb{R}^{d_x}$ be the covariate space; for $d_y \in \mathbb{N}$, let $\mathcal{Y} \subseteq \mathbb{R}^{d_y}$ be the response space
and, for $d := d_x + d_y$, let $\mathcal{Z} = \mathcal{X} \times \mathcal{Y} \subseteq \mathbb{R}^d$ be the sample space.

For any $k \in \mathbb{N}$, if $E \subseteq \mathbb{R}^k$ and $x \in \mathbb{R}^k$, the translate of $E$ is the set $E + x := \{z + x : z \in E\}$. If
$\xi, \vartheta \in \mathbb{R}^k$, the Euclidean distance between $\xi$ and $\vartheta$ is $\|\xi - \vartheta\| := \{\sum_{i=1}^k (\xi_i - \vartheta_i)^2\}^{1/2}$.


Let

\[ \mathcal{F} := \Big\{ f : \mathcal{Z} \to [0, \infty) \ \Big|\ f \text{ Borel-measurable and, } \forall x \in \mathcal{X},\ \int_{\mathcal{Y}} f(y|x)\,dy = 1 \Big\} \]

be the space of conditional probability densities with respect to Lebesgue measure $m$ on $\mathcal{Y}$. The
same symbol $m$ will also be used to denote Lebesgue measure on $\mathcal{Z}$. A centered multivariate normal
density with covariance matrix $\sigma^2 I$, for $I$ the identity matrix whose dimension is clear from the
context, is denoted by $\phi_\sigma$. The symbol $\delta_z$ stands for point mass at $z$.


Let $Q$ be a fixed probability measure on the measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, with $\mathcal{B}(\mathcal{X})$ the Borel
$\sigma$-field on $\mathcal{X}$, that possesses Lebesgue density $q$.


Given any real number $p \ge 1$ and Borel-measurable function $g : \mathcal{Z} \to \mathbb{R}$, for every $x \in \mathcal{X}$,
we introduce the notation $\|g\|_{p,x} := (\int_{\mathcal{Y}} |g(x, y)|^p\,dy)^{1/p}$, which is useful to define global
distances between conditional densities. For any pair of (conditional) densities $f_1, f_2 \in \mathcal{F}$, let the
$q$-integrated $L^1$-distance be defined as $\|f_2 - f_1\|_1 := \int_{\mathcal{X}} \|f_2 - f_1\|_{1,x}\,q(x)\,dx$ and, analogously, the
$q$-integrated Hellinger distance as $h(f_2, f_1) := (\int_{\mathcal{X}} \|f_2^{1/2} - f_1^{1/2}\|_{2,x}^2\,q(x)\,dx)^{1/2}$. For (conditional) densities
$f, f_0 \in \mathcal{F}$, the $q$-integrated Kullback-Leibler divergence of $f$ from $f_0$ is defined as
$\mathrm{KL}(f_0; f) := \int_{\mathcal{X} \times \mathcal{Y}} f_0 q \log(f_0 q / f q)\,dm$, $m$ being here the Lebesgue measure on $\mathcal{Z}$, which coincides with the
Kullback-Leibler divergence of $fq$ from $f_0 q$. Analogously, the $q$-integrated second moment of $\log(f_0 q / f q)$
is defined as $V_2(f_0; f) := \int_{\mathcal{X} \times \mathcal{Y}} f_0 q\,|\log(f_0 q / f q)|^2\,dm$ and coincides with the second moment of
$\log(f_0 q / f q)$ with respect to $f_0 q$.
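
As a numerical illustration of the two $q$-integrated distances just defined, the following sketch approximates them by midpoint Riemann sums in the scalar case $d_x = d_y = 1$. The specific densities (a standard normal versus a normal with mean $x$), the uniform design density and all function names are assumptions made for the example only; they are not part of the paper.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def midpoints(lo, hi, n):
    """Midpoints of n equal cells of [lo, hi) and the common cell width."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)], step


def integrated_distances(f1, f2, q, nx=50, ny=400, y_lo=-10.0, y_hi=10.0):
    """Approximate the q-integrated L1 and Hellinger distances on X = [0, 1]."""
    xs, dx = midpoints(0.0, 1.0, nx)
    ys, dy = midpoints(y_lo, y_hi, ny)
    l1, h2 = 0.0, 0.0
    for x in xs:
        # inner integrals over y: ||f2 - f1||_{1,x} and ||f2^{1/2} - f1^{1/2}||_{2,x}^2
        l1_x = sum(abs(f2(y, x) - f1(y, x)) for y in ys) * dy
        h2_x = sum((math.sqrt(f2(y, x)) - math.sqrt(f1(y, x))) ** 2 for y in ys) * dy
        # outer integral against the design density q
        l1 += l1_x * q(x) * dx
        h2 += h2_x * q(x) * dx
    return l1, math.sqrt(h2)


def f_a(y, x):
    return normal_pdf(y, 0.0, 1.0)  # response independent of the covariate


def f_b(y, x):
    return normal_pdf(y, x, 1.0)    # response mean shifts with the covariate


def q_uniform(x):
    return 1.0                      # uniform design density on [0, 1]


l1_dist, hellinger_dist = integrated_distances(f_a, f_b, q_uniform)
```

The two outputs can be checked against the usual relations $h^2 \le \|\cdot\|_1 \le 2h$, which carry over to the $q$-integrated versions because $q$ integrates to one.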

The $\epsilon$-covering number of a semi-metric space $(M, d)$, denoted by $N(\epsilon, M, d)$, is the minimal
number of $d$-balls of radius $\epsilon$ needed to cover the set $M$.


2 Main Results 


Let $Z^{(n)} = (Z_1, \ldots, Z_n)$ be a random sample of independent and identically distributed (i.i.d.)
observations $Z_i = (X_i, Y_i) \in \mathcal{Z}$, $i = 1, \ldots, n$, from a probability measure $P_0$ on the measurable
space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, where $\mathcal{B}(\mathcal{Z})$ is the Borel $\sigma$-field on $\mathcal{Z}$, that possesses Lebesgue density $f_0 q$,
referred to as the true joint data generating density, with $f_0 \in \mathcal{F}$ the conditional density of the
response $Y$, given the predictor $X$, and $q$ the marginal density of $X$, called the design density,
which is fixed and, for theoretical investigation, does not need to be known or estimated. The
problem is to estimate the conditional density $f_0$ when no parametric assumption is formulated
on it, taking an empirical Bayes approach that employs an automatic data-driven selection of the
prior hyper-parameters. For a recent overview of empirical Bayes methods, the reader may refer
to Petrone et al. (2014a). Even if the proposed empirical Bayes procedure simultaneously leads to
adaptation with respect to both aspects (a) and (b) illustrated in the Introduction, the two issues are
treated separately for ease of exposition: we first deal with adaptive estimation over classes of locally
Hölder smooth conditional densities when the dimension of the predictor is correctly specified and
then prove adaptive dimension reduction in the case where fewer covariates are relevant. Adaptive
dimension reduction clearly plays a key role in view of the curse of dimensionality. In Section 2.2 it
is shown that, when the response is independent of (some of) the covariates introduced in the model,
the empirical Bayes posterior asymptotically performs a dimension reduction, thus contracting at
a rate that results from the combination of the dimension of the subset of relevant explanatory
variables and the possibly anisotropic regularity level of the curve as a function of the selected
covariates.


2.1 Empirical Bayes posterior concentration for conditional density estimation


In this section, we consider empirical Bayes posterior contraction rates for estimating conditional 
densities when the dimension of the predictor is correctly specified. 


Prior law specification. A prior distribution can be induced on the space $\mathcal{F}$ of conditional densities
by a law $\Pi_{\mathcal{X}}$ on a collection of mixing probability measures $\mathcal{M}_{\mathcal{X}} = \{P_x \in \mathcal{M}(\Theta),\ x \in \mathcal{X}\}$, where
$\mathcal{M}(\Theta)$ denotes the space of all probability measures on some subset $\Theta$ of Euclidean space, using a mixture of
$d_y$-dimensional Gaussian kernels to model the conditional density

\[ f(\cdot|x) = (F_x * \phi_\sigma)(\cdot) = \int_\Theta \phi_\sigma(\cdot - \theta)\,dF_x(\theta), \qquad x \in \mathcal{X}, \]

where, for every $x \in \mathcal{X}$, $F_x$ is the cumulative distribution function corresponding to a probability
measure $P_x$ which is assumed to be (almost surely) discrete

\[ P_x = \sum_{j=1}^\infty p_j(x)\,\delta_{\theta_j(x)}, \]

with random weights $p_j(x) \ge 0$, $j \in \mathbb{N}$, such that $\sum_{j=1}^\infty p_j(x) = 1$ almost surely, and random support
points $\{\theta_j(x)\}$ that are i.i.d. replicates drawn from a probability measure $G_x$ on $\Theta$. Following
Pati et al. (2013), we single out two relevant special cases.

• Predictor-dependent mixtures of Gaussian linear regressions (MGLR$_x$): the conditional density
is modeled as a mixture of Gaussian linear regressions

\[ f(\cdot|x) = \int \phi_\sigma(\cdot - \beta' x)\,dF_x(\beta), \qquad x \in \mathcal{X}, \]

where $\beta' x$ denotes the usual inner product in $\mathbb{R}^{d_x}$ and the mixing measure $P_x$ corresponding
to $F_x$ is such that $P_x = \sum_{j=1}^\infty p_j(x)\,\delta_{\beta_j}$ almost surely, with the vectors of regression coefficients
$\beta_j$ i.i.d. from $G$. For a particular structure of the random weights $p_j(x)$, probit stick-breaking
mixtures of Gaussian kernels are obtained. The probit transformation of Gaussian processes for
constructing the stick-breaking weights has been considered in Rodriguez and Dunson (2011),
who exhibit applications to real data of the probit stick-breaking process model.

• Gaussian mixtures of fixed-p dependent processes: if $p_j(x) = p_j$ for all $x \in \mathcal{X}$, we obtain
mixtures of Gaussian kernels with fixed weights. Versions of fixed-p dependent Dirichlet process
mixtures of Gaussian densities (fixed p-DDP) have been applied to ANOVA, survival analysis
and spatial modeling.
We consider a variant of the prior proposed in Norets and Pati (2014). Let $\nu$ be a probability
measure on $\mathcal{X}$ and $G$ a probability measure on $\mathcal{Y}$. For $(\lambda, \tau) \in \mathcal{Y} \times \mathbb{R}_+$, with abuse of notation,
let $G_\tau(\cdot - \lambda)$ denote the probability measure on $\mathcal{Y}$ with Lebesgue density $\tau^{-1}(dG/dm)((\cdot - \lambda)/\tau)$.
Given $(\mu_j^x, \mu_j^y) \in \mathcal{Z}$, $j \in \mathbb{N}$, and $\sigma \in \mathbb{R}_+$, for every $x \in \mathcal{X}$, let

\[ p_{j,\sigma}(x) := \frac{p_j\,\phi_\sigma(x - \mu_j^x)}{\sum_{q=1}^\infty p_q\,\phi_\sigma(x - \mu_q^x)}, \qquad j \in \mathbb{N}. \]

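We propose the following prior specification:

\[
\begin{aligned}
Y_i \mid (X_i = x_i),\ (F_x)_{x \in \mathcal{X}},\ \sigma &\sim (F_{x_i} * \phi_\sigma)(\cdot) = \sum_{j=1}^\infty p_{j,\sigma}(x_i)\,\phi_\sigma(\cdot - \mu_j^y),\\
\sum_{j=1}^\infty p_j\,\delta_{(\mu_j^x,\,\mu_j^y)} &\sim \mathrm{DP}(c_0\,\nu \times G_\tau(\cdot - \lambda)) \quad \text{independent of} \quad \sigma \sim \mathrm{IG}(\alpha, \beta),
\end{aligned}
\tag{2.1}
\]

where $c_0 \in \mathbb{R}_+$ is a finite constant and $\alpha, \beta \in \mathbb{R}_+$ are the shape and scale parameters of an inverse-gamma
prior distribution, respectively. In this case, $F_x$ corresponds to the probability measure
$P_x = \sum_{j=1}^\infty p_{j,\sigma}(x)\,\delta_{\mu_j^y}$. For later use, note that, having defined the mapping $g : x \mapsto \sum_{q=1}^\infty p_q\,\phi_\sigma(x - \mu_q^x)$
and modeled the conditional density $f$ as $\sum_{j=1}^\infty p_{j,\sigma}(x)\,\phi_\sigma(y - \mu_j^y)$, the product $fg$ is a mixture
of $d$-dimensional Gaussian densities

\[ f(y|x)\,g(x) = \sum_{j=1}^\infty p_j\,\phi_\sigma(x - \mu_j^x)\,\phi_\sigma(y - \mu_j^y). \tag{2.2} \]

By the stick-breaking representation of a Dirichlet process (DP), the random weights are $p_j = V_j \prod_{i=1}^{j-1}(1 - V_i)$,
$j \in \mathbb{N}$, with $V_j$ i.i.d. $\mathrm{Beta}(1, c_0)$, and the locations $\mu_j^y \sim G_\tau(\cdot - \lambda)$. The last assertion is equivalent
to $\mu_j^y = \lambda + \zeta_j$, with $\zeta_j \sim \tau^{-1}(dG/dm)(\cdot/\tau)$, $j \in \mathbb{N}$. The overall prior can be rewritten as

\[
\begin{aligned}
Y_i \mid (X_i = x_i),\ (F_x)_{x \in \mathcal{X}},\ \sigma &\sim \sum_{j=1}^\infty p_{j,\sigma}(x_i)\,\phi_\sigma(\cdot - \lambda - \zeta_j),\\
\sum_{j=1}^\infty p_j\,\delta_{(\mu_j^x,\,\zeta_j)} &\sim \mathrm{DP}(c_0\,\nu \times G_\tau) \quad \text{independent of} \quad \sigma \sim \mathrm{IG}(\alpha, \beta).
\end{aligned}
\tag{2.3}
\]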
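
To make the structure of the prior concrete, the sketch below simulates a truncated version of it in the scalar case $d_x = d_y = 1$: stick-breaking weights, covariate-dependent weights $p_{j,\sigma}(x)$, and the resulting conditional density. The truncation level, the choice of $\nu$ as uniform on $[0, 1]$ and of $G$ as standard normal, the numerical values of $c_0$, $\lambda$, $\tau$, $\sigma$ and all function names are assumptions made for illustration, not choices stated in the paper.

```python
import math
import random


def normal_pdf(u, mean, sd):
    """Density of N(mean, sd^2) at u."""
    z = (u - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def stick_breaking_weights(c0, n_atoms, rng):
    """Truncated stick-breaking: V_j ~ Beta(1, c0), p_j = V_j * prod_{i<j} (1 - V_i).

    The last atom receives the leftover stick so that the weights sum to one
    (a truncation device; the prior itself uses infinitely many atoms).
    """
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, c0)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def covariate_weights(x, p, mu_x, sigma):
    """p_{j,sigma}(x) = p_j phi_sigma(x - mu_j^x) / sum_q p_q phi_sigma(x - mu_q^x)."""
    kernels = [pj * normal_pdf(x, mx, sigma) for pj, mx in zip(p, mu_x)]
    total = sum(kernels)
    return [k / total for k in kernels]


def conditional_density(y, x, p, mu_x, mu_y, sigma):
    """f(y|x) = sum_j p_{j,sigma}(x) phi_sigma(y - mu_j^y)."""
    w = covariate_weights(x, p, mu_x, sigma)
    return sum(wj * normal_pdf(y, my, sigma) for wj, my in zip(w, mu_y))


rng = random.Random(0)
J = 25                                   # truncation level (assumption)
p = stick_breaking_weights(c0=1.0, n_atoms=J, rng=rng)
mu_x = [rng.random() for _ in range(J)]  # nu taken as uniform on [0, 1] (assumption)
lam, tau = 0.5, 2.0                      # hypothetical location and scale hyper-parameters
mu_y = [lam + tau * rng.gauss(0.0, 1.0) for _ in range(J)]  # mu_j^y = lambda + zeta_j
sigma = 0.3                              # fixed here; the prior draws it from IG(alpha, beta)
```

With these draws, multiplying `conditional_density(y, x, ...)` by the normalizer $g(x) = \sum_q p_q \phi_\sigma(x - \mu_q^x)$ returns the joint Gaussian mixture in (2.2), which is the identity exploited later in the proof.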


For the vector $\gamma = (\beta, \lambda, \tau^2)$ of prior hyper-parameters, let $\Pi_\gamma$ stand for the product prior law
$\mathrm{DP}(c_0\,\nu \times G_\tau(\cdot - \lambda)) \times \mathrm{IG}(\alpha, \beta)$. Let $\Pi_\gamma(B|Z^{(n)})$ denote the posterior probability of any Borel set
$B$ of $(\mathcal{F}, d)$, where $d$ can be either the $q$-integrated Hellinger or $L^1$-distance. For any estimator
$\hat\gamma_n = (\hat\beta_n, \hat\lambda_n, \hat\tau_n^2)$ of $\gamma$ based on $Z^{(n)}$, the empirical Bayes posterior law $\Pi_{\hat\gamma_n}(\cdot|Z^{(n)})$ is obtained by
plugging $\hat\gamma_n$ into the posterior distribution

\[ \Pi_{\hat\gamma_n}(\cdot|Z^{(n)}) = \Pi_\gamma(\cdot|Z^{(n)})\big|_{\gamma = \hat\gamma_n}. \]

We study empirical Bayes posterior concentration rates relative to $d$ at an ordinary smooth conditional
density $f_0$, namely, we assess the order of magnitude of the radius $M\epsilon_n$ of a shrinking ball
centered at $f_0$ so that

\[ P_0^n\,\Pi_{\hat\gamma_n}\big(f \in \mathcal{F} : d(f, f_0) > M\epsilon_n \mid Z^{(n)}\big) \to 0, \tag{2.4} \]
where $P_0^n\varphi$ is used to abbreviate expectation $\int_{\mathcal{Z}^n} \varphi\,dP_0^n$ under the $n$-fold product measure $P_0^n$. We
consider the case where the true conditional density $f_0$, regarded as a mapping from $\mathcal{Z}$ to $\mathbb{R}_+ \cup \{0\}$,
satisfies a Hölder condition in the sense of the following definition, for which we introduce some more
notation. For any $\beta \in \mathbb{R}_+$, let $\langle\beta\rangle := \max\{i \in \mathbb{N}_0 : i < \beta\}$ be the largest non-negative integer strictly
smaller than $\beta$. For a $d$-dimensional multi-index $k = (k_1, \ldots, k_d) \in \mathbb{N}_0^d$, define $k. = k_1 + \ldots + k_d$
and let $D^k$ denote the mixed partial derivative operator $\partial^{k.}/\partial z_1^{k_1} \cdots \partial z_d^{k_d}$.
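
The notation $\langle\beta\rangle$ differs from the usual integer part when $\beta$ is itself an integer, which matters for the Hölder exponent $\beta - \langle\beta\rangle$ used below. The short sketch that follows computes $\langle\beta\rangle$ and enumerates the multi-indices $k$ with $k. = \langle\beta\rangle$; the function names are hypothetical and introduced only for this illustration.

```python
import math
from itertools import product


def strict_floor(beta):
    """Largest non-negative integer strictly smaller than beta (beta > 0)."""
    return math.ceil(beta) - 1


def holder_exponent(beta):
    """Exponent beta - <beta>, which always lies in (0, 1]."""
    return beta - strict_floor(beta)


def multi_indices(d, order):
    """All k in N_0^d with k_1 + ... + k_d equal to the given order."""
    return [k for k in product(range(order + 1), repeat=d) if sum(k) == order]


# For beta = 2 the top-order derivatives have order <beta> = 1, not 2,
# and they are required to be Hölder continuous with exponent 1 (Lipschitz).
top_order = strict_floor(2.0)
indices = multi_indices(3, top_order)
```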

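Definition 2.1. For any $\beta \in \mathbb{R}_+$, $\tau \ge 0$ and function $L : \mathcal{Z} \to \mathbb{R}_+ \cup \{0\}$, let the class $\mathcal{C}^{\beta,L,\tau}(\mathcal{Z})$
consist of functions $f : \mathcal{Z} \to \mathbb{R}$ that have finite mixed partial derivatives $D^k f$ of all orders $k. \le \langle\beta\rangle$
and, for every $k \in \mathbb{N}_0^d$ such that $k. = \langle\beta\rangle$, the mixed partial derivatives of order $k$ are locally
(uniformly) Hölder continuous with exponent $\beta - \langle\beta\rangle$ in $\mathcal{Z}$ with envelope $L$,

\[ |(D^k f)(z + \Delta) - (D^k f)(z)| \le L(z)\,e^{\tau\|\Delta\|^2}\,\|\Delta\|^{\beta - \langle\beta\rangle}, \qquad \forall\, z, \Delta \in \mathcal{Z}. \tag{2.5} \]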


This function class has been previously considered by Shen et al. (2013), who constructively
showed that Lebesgue probability density functions in $\mathcal{C}^{\beta,L,\tau}(\mathbb{R}^d)$ satisfying additional regularity
conditions can be approximated by convolutions with the Gaussian kernel $\phi_\sigma$ with an $L^1$-error of
the order $\sigma^\beta$. The construction of the mixing density in the approximation can be viewed as a
multivariate extension of the results in Kruijer et al. (2010, § 3), the main difference being that
condition (2.5) is weaker than the one employed in Kruijer et al. (2010), where it is assumed that
$\log f_0 \in \mathcal{C}^{\beta,L,0}(\mathbb{R})$.

If $\epsilon_n$ is (an upper bound on) the posterior contraction rate and the convergence in (2.4) is
at least as fast as $\epsilon_n^2$, then $\epsilon_n$ is (an upper bound on) the rate of convergence of the estimator
$\hat f_n(\cdot|x) = \int_{\mathcal{F}} f(\cdot|x)\,\Pi_{\hat\gamma_n}(df|Z^{(n)})$. Since the convergence rate of an estimator cannot be faster than
the minimax rate over the considered density function class, the posterior contraction rate cannot
be faster than the minimax rate. So, if the posterior distribution achieves the minimax rate, then
$\{\hat f_n(\cdot|x)\}_{x \in \mathcal{X}}$ also has minimax-optimal convergence rate and is adaptive.

In order to state the main result on empirical Bayes posterior contraction rates at locally Hölder
smooth densities, we report below the assumptions on the “true” joint data generating density $f_0 q$
and the prior law $\Pi_\gamma$.


2.1.1 Assumptions on the joint data generating density and on the prior law 

Assumptions on $f_0 q$

(i) $\mathcal{X} = [0, 1]^{d_x}$;

(ii) $q$ is bounded;

(iii) $f_0 \in \mathcal{C}^{\beta,L,\tau}(\mathcal{Z})$. For some $\eta \in \mathbb{R}_+$, $\int_{\mathcal{Z}} (|L|/f_0)^{(2\beta+\eta)/\beta} f_0\,dm < \infty$ and
$\int_{\mathcal{Z}} (|D^k f_0|/f_0)^{(2\beta+\eta)/k.} f_0\,dm < \infty$ for all $k. \le \langle\beta\rangle$;

(iv) there exist constants $B_0, \tau \in \mathbb{R}_+$ such that, for every $x \in \mathcal{X}$,

\[ f_0(y|x) \lesssim \exp(-B_0\|y\|^\tau) \quad \text{for large } \|y\|. \]

Assumption on $\Pi_\gamma$

(v) the base probability measure $\nu \times G$ of the Dirichlet process possesses Lebesgue density and
there exist constants $\rho, C_0 \in \mathbb{R}_+$ so that

\[ \frac{dG}{dm}(y) \propto \exp(-C_0\|y\|^\rho) \quad \text{for large } \|y\|. \]


Assumption (i) is not restrictive since we can always reduce to it by rescaling the covariate
observations to live in the unit interval. Assumption (ii) is verified as soon as the design density
is continuous on the closed unit interval; see the comments following the statement of Theorem 2.1
concerning its use in the proof. Assumption (iii) requires Hölder-type regularity of $f_0$ in addition
to integrability conditions which, jointly with assumption (iv), are used to approximate $f_0 1_{\mathcal{X}}$ with
a finite $d$-dimensional Gaussian mixture having a sufficiently restricted number of support points;
see Theorem 3, Proposition 1 and Theorem 4 of Shen et al. (2013).

We now state the main result.

Theorem 2.1. Suppose there exists a set $K_n \subseteq \mathbb{R}_+ \times \mathbb{R} \times \mathbb{R}_+$ such that $P_0^n(\hat\gamma_n \in K_n^c) = o(1)$. Under
assumptions (i)-(v), the empirical Bayes posterior distribution corresponding to the prior in (2.3)
contracts at a rate $\epsilon_n = n^{-\beta/(2\beta+d)}(\log n)^t$ for a suitable constant $t > 0$.

We give a few comments on Theorem 2.1 before presenting its proof. The empirical Bayes posterior
distribution corresponding to the prior described in (2.3) contracts at a rate $n^{-\beta/(2\beta+d)}(\log n)^t$
which differs from the minimax $L^1$-rate attached to the class of locally Hölder densities $\mathcal{C}^{\beta,L,\tau}(\mathcal{Z})$
by at most a logarithmic factor. The quality of the estimation improves with increasing regularity
level $\beta$ and deteriorates with increasing dimension $d$. Furthermore, the rate automatically adapts
to the unknown regularity level $\beta$ of the “true” conditional density $f_0$, whatever $\beta \in \mathbb{R}_+$; see, e.g.,
Scricciolo (2015) for an overview of the main schemes for Bayesian adaptation. This implies the
existence of empirical Bayes procedures for conditional density estimation that attain minimax-optimal
rates, up to logarithmic terms, over the full scale of locally Hölder densities and perform as well
as adaptive Bayesian procedures like the one entailed by the hierarchical prior of finite Dirichlet
mixtures of Gaussian densities proposed by Norets and Pati (2014).
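
The dependence of the rate on $\beta$ and $d$ described above can be tabulated directly. The sketch below evaluates $n^{-\beta/(2\beta+d)}(\log n)^t$; since the theorem only asserts the existence of a suitable $t > 0$, the value `t=1.0` and the function name are assumptions made purely for illustration.

```python
import math


def contraction_rate(n, beta, d, t=1.0):
    """Evaluate n^{-beta/(2 beta + d)} (log n)^t; t is a placeholder exponent."""
    return n ** (-beta / (2.0 * beta + d)) * math.log(n) ** t


n = 10 ** 6
# Fixed total dimension d = d_x + d_y = 2: smoother densities give faster rates.
rates_by_beta = {beta: contraction_rate(n, beta, d=2) for beta in (0.5, 1.0, 2.0, 4.0)}
# Fixed smoothness beta = 2: extra (possibly irrelevant) covariates slow the rate,
# which is why adaptive dimension reduction matters.
rates_by_dim = {d: contraction_rate(n, 2.0, d) for d in (2, 3, 5, 10)}
```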

The problem presents two main difficulties:

(a) data-dependence of the prior law due to an automatic data-driven selection of the prior
hyper-parameters;

(b) dependence of $f_0$ on the covariates, which accounts for the dependence of the convergence rate
on the dimension $d$ of the sample space $\mathcal{Z}$.
Concerning (a), data-dependence of the prior can be dealt with by resorting to the same key idea
as in Petrone et al. (2014b) and Donnet et al. (2014), which is based on a prior measure change
aimed at transferring data-dependence from the prior law to the likelihood, as long as a parameter
transformation can be identified.

Concerning (b), dependence of $f_0$ on the covariates can be dealt with by regarding $f_0$ as a $d$-multivariate
joint density with respect to Lebesgue measure on $[0, 1]^{d_x} \times \mathcal{Y}$. Indeed, $f_0$ is a joint density, but
with respect to the measure $Q \times m$ on $\mathcal{Z}$, which prevents immediate use of Gaussian mixtures for
its approximation. A device due to Norets and Pati (2014) is based on the inequality

\[ h(f, f_0) \lesssim \|(fg)^{1/2} - (f_0 1_{[0,1]^{d_x}})^{1/2}\|_2, \]

which relates the $q$-integrated Hellinger distance between the conditional densities $f$ and $f_0$ to the
Hellinger distance between the joint densities $fg$ and $f_0 1_{[0,1]^{d_x}}$, where
$f(y|x)g(x) = \sum_{j=1}^\infty p_j\,\phi_\sigma(x - \mu_j^x)\,\phi_\sigma(y - \mu_j^y)$ by virtue of equality (2.2). The device takes advantage of the special structure of the mixing
weights $p_{j,\sigma}(x)$ in model (2.1) for the conditional density $f$ to approximate the joint Lebesgue density
$f_0 1_{[0,1]^{d_x}}$ by mixtures of $d$-dimensional Gaussian densities. Thus, the problem of approximating the
“true” joint data generating density $f_0 q$ with $fq$ is translated into the problem of approximating
$f_0 1_{[0,1]^{d_x}}$ with mixtures of $d$-dimensional Gaussian densities.

Proof. We appeal to the theorem reported in the Appendix, which is an adapted version of Theorem
1 in Donnet et al. (2014).

We first define the parameter transformation for the change of prior law. For sequences $\underline{b}_n \downarrow 0$,
$\bar{b}_n \uparrow \infty$, $\underline{l}_n \downarrow -\infty$, $\bar{l}_n \uparrow \infty$, $\underline{t}_n \downarrow 0$ and $\bar{t}_n \uparrow \infty$, consider a set
$K_n = [\underline{b}_n, \bar{b}_n) \times [\underline{l}_n, \bar{l}_n) \times [\underline{t}_n^2, \bar{t}_n^2) \subset \mathbb{R}_+ \times \mathbb{R} \times \mathbb{R}_+$
such that $P_0^n(\hat\gamma_n \in K_n^c) = o(1)$. For a sequence $u_n \downarrow 0$ to be suitably defined later on,
consider a $u_n$-covering of $K_n$ by Euclidean open balls of radius $u_n$. To this aim, let $v_n, w_n, z_n$ be
positive infinitesimal sequences to be chosen as prescribed later on. Consider

- a covering of $[\underline{b}_n, \bar{b}_n)$ with intervals $B_r = [b_r, b_{r+1})$, where $b_r := \underline{b}_n(1 + z_n)^{r-1}$ for
$r = 1, \ldots, \lceil \log(\bar{b}_n/\underline{b}_n)/\log(1 + z_n) \rceil$,

- a $v_n$-covering of $[\underline{l}_n, \bar{l}_n)$ with intervals $L_k = [l_k, l_{k+1})$, where $l_k := \underline{l}_n + (k - 1)v_n$ for
$k = 1, \ldots, \lceil (\bar{l}_n - \underline{l}_n)/v_n \rceil$,

- a covering of $[\underline{t}_n^2, \bar{t}_n^2)$ with intervals $T_s = [t_s^2, t_{s+1}^2)$, where $t_s^2 := \underline{t}_n^2(1 + w_n)^{s-1}$ for
$s = 1, \ldots, \lceil 2\log(\bar{t}_n/\underline{t}_n)/\log(1 + w_n) \rceil$.

For any $b \in B_r$, let $\pi_r := b/b_r$. We have $1 \le \pi_r < 1 + z_n$. For any $t^2 \in T_s$, let $\rho_s := (t^2/t_s^2)^{1/2}$. We
have $1 \le \rho_s < (1 + w_n)^{1/2}$. Fix $\gamma' = (b_r, l_k, t_s^2)$. For any $\gamma = (b, l, t^2) \in B_r \times L_k \times T_s$, the Euclidean
distance satisfies

\[ \|\gamma - \gamma'\| = [(b - b_r)^2 + (l - l_k)^2 + (t^2 - t_s^2)^2]^{1/2} \le [(1 + z_n)^2 z_n^2 \bar{b}_n^2 + v_n^2 + (1 + w_n)^2 w_n^2 \bar{t}_n^4]^{1/2} =: u_n. \]

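In order to have $u_n = o(1)$, it suffices that $w_n = o(\bar{t}_n^{-2})$ and $z_n = o(\bar{b}_n^{-1})$. The $u_n$-covering number
$N_n$ of $K_n$ relative to the Euclidean distance is

\[ N_n = O\left( \frac{\log(\bar{b}_n/\underline{b}_n)}{\log(1 + z_n)} \times \frac{\bar{l}_n - \underline{l}_n}{v_n} \times \frac{\log(\bar{t}_n/\underline{t}_n)}{\log(1 + w_n)} \right), \]

with $v_n, w_n, z_n$ that need to be chosen so that $N_n = o(e^{n\epsilon_n^2})$ as postulated by requirement [A1].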
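
The covering construction above combines two geometric grids (for the scale-type hyper-parameters) with one arithmetic grid (for the location). The following sketch builds the three grids for fixed, hypothetical values of the endpoints and of $z_n, v_n, w_n$, counts the cells, and compares the largest distance between a point of a cell and the cell's corner $\gamma'$ with the radius $u_n$. All numerical values and function names are assumptions chosen only to make the construction tangible.

```python
import math


def geometric_grid(lo, hi, ratio_step):
    """Left endpoints lo * (1 + ratio_step)^(r - 1), r = 1, ..., ceil(log(hi/lo)/log(1 + ratio_step))."""
    count = math.ceil(math.log(hi / lo) / math.log1p(ratio_step))
    return [lo * (1.0 + ratio_step) ** r for r in range(count)]


def arithmetic_grid(lo, hi, step):
    """Left endpoints lo + (k - 1) * step, k = 1, ..., ceil((hi - lo)/step)."""
    count = math.ceil((hi - lo) / step)
    return [lo + k * step for k in range(count)]


def covering_radius(b_hi, t2_hi, z, v, w):
    """u_n = [(1+z)^2 z^2 b_hi^2 + v^2 + (1+w)^2 w^2 t2_hi^2]^{1/2}, with t2_hi the upper end of the t^2 range."""
    return math.sqrt(((1 + z) * z * b_hi) ** 2 + v ** 2 + ((1 + w) * w * t2_hi) ** 2)


# Hypothetical endpoints of K_n and grid parameters (not taken from the paper).
b_lo, b_hi = 0.1, 10.0
l_lo, l_hi = -3.0, 3.0
t2_lo, t2_hi = 0.01, 100.0
z, v, w = 0.05, 0.25, 0.05

b_grid = geometric_grid(b_lo, b_hi, z)
l_grid = arithmetic_grid(l_lo, l_hi, v)
t2_grid = geometric_grid(t2_lo, t2_hi, w)

n_cells = len(b_grid) * len(l_grid) * len(t2_grid)

# The farthest point of a cell from its corner gamma' is the opposite corner;
# the largest such distance occurs in the cell with the largest b_r and t_s^2.
worst_distance = math.sqrt((b_grid[-1] * z) ** 2 + v ** 2 + (t2_grid[-1] * w) ** 2)
u = covering_radius(b_hi, t2_hi, z, v, w)
```

The geometric spacing is what keeps the number of cells logarithmic in the ratios $\bar{b}_n/\underline{b}_n$ and $\bar{t}_n/\underline{t}_n$, in line with the order of $N_n$ displayed above.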

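Fix $\gamma' = (b_r, l_k, t_s^2) \in B_r \times L_k \times T_s$ and consider any $\gamma = (b, l, t^2) \in B_r \times L_k \times T_s$. If
$\sigma' \sim \mathrm{IG}(\alpha, b_r)$ then $\pi_r\sigma' \sim \mathrm{IG}(\alpha, b)$. For $z_j' = (\mu_j^x, \zeta_j')$, if $F' = \sum_{j=1}^\infty p_j\,\delta_{z_j'} \sim \mathrm{DP}(c_0\,\nu \times G_{t_s})$
then $F = \sum_{j=1}^\infty p_j\,\delta_{(\mu_j^x,\ l + \rho_s\zeta_j')} \sim \mathrm{DP}(c_0\,\nu \times G_t(\cdot - l))$, where $l$ denotes a $d_y$-dimensional vector with
components all equal to $l$. Throughout, we use the same symbol $l$ to denote either the scalar or the
vector, the correct interpretation being clear from the context.

Let $\theta = (F, \sigma)$. For every $x \in \mathcal{X}$, let $f_\theta(\cdot|x) = \sum_{j=1}^\infty p_{j,\sigma}(x)\,\phi_\sigma(\cdot - \mu_j^y)$. For $\theta' = (F', \sigma')$, the transformation
$\psi_{\gamma',\gamma}(\theta')$ gives rise to the following density

\[ f_{\psi_{\gamma',\gamma}(\theta')}(\cdot|x) = \sum_{j=1}^\infty p_{j,\pi_r\sigma'}(x)\,\phi_{\pi_r\sigma'}(\cdot - l - \rho_s\zeta_j'). \]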

We now identify a set $B_n$ such that

\[ \inf_{\gamma \in K_n} \Pi_\gamma(B_n) \ge e^{-Cn\epsilon_n^2} \tag{2.6} \]

for some constant $C > 0$. Preliminarily, note that, by Lemma 7.1 of Norets and Pati (2014), in
virtue of assumption (ii), the squared $q$-integrated Hellinger distance between $f_\theta$ and $f_0$ can be
bounded above as follows:

\[ h^2(f_\theta, f_0) \le 4\|q\|_\infty\,\|(f_\theta g)^{1/2} - (f_0 1_{\mathcal{X}})^{1/2}\|_2^2, \]

where $\|q\|_\infty := \sup_{x \in \mathcal{X}} q(x)$ and the Lebesgue density $g$ is such that $f_\theta(y|x)g(x) = \sum_{j=1}^\infty p_j\,\phi_{\mu_j,\sigma}(x, y)$,
that is, $g(x) = \sum_{q=1}^\infty p_q\,\phi_{\mu_q^x,\sigma}(x)$. This allows us to use $d$-dimensional Gaussian mixtures $\sum_{j=1}^\infty p_j\,\phi_{\mu_j,\sigma}(x, y)$
to approximate the density $f_0(y|x)1_{\mathcal{X}}(x)$ defined on $\mathcal{Z}$. The set $B_n$ is the same as the one described
in Theorem 3.1 of Norets and Pati (2014). Let $\sigma_n = (\epsilon_n|\log\epsilon_n|^{-1})^{1/\beta}$ and $a_{\sigma_n} = a_0|\log\sigma_n|^{1/\tau}$, with
$a_0 = [(8\beta + 4\eta + 16)/(B_0\delta)]^{1/\tau}$ for a sufficiently small $\delta > 0$. Find $b_1 > \max\{1, 1/(2\beta)\}$ so that
$\epsilon_n^{b_1}|\log\epsilon_n|^{5/4} \le \epsilon_n$. As in the proof of Theorem 3.1 in Norets and Pati (2014), which is an adaptation
of that of Theorem 4 in Shen et al. (2013), the following facts hold. First, there exists a partition
U\, ..., Uk of {z G Z : ||z|| < a CTn } such that, for j = 1, ..., N, with 1 < N < K, the ball Uj is 
centered at Zj = (xj , yj) and has diameter cr„e„ while, for j = N+ 1, ..., K, each set Uj has diam¬ 
eter bounded above by a n . This can be realized with 1 < N < K = 0(<r“ d | loge n | d ( 1+1 / T )). Further 
extend this to a partition Ui, ..., Um of for M = 0(en d ^\ loge„| ds ), with s = 1 + 1//3 + 1/r, 
such that 1 > infp ,t)£K n (c 0 v x G t {■ - l)){Uj) > (cr n el bl ) d for all j = 1 , ..., M, provided that 
In = 0(a c r„), t n = 0(a? ) and a an = 0(t n \ logSecond, by virtue of assumptions (Hi) 
and (iv), there exists 8* = (F*, a n ), where F* = P*j&n*i with p* = Zj for j = 1, ..., N , so 

that fo*(y\x)g(x) = EE Pj<l>rt,<r n ( x > v) and ll(/e* 9) 1/2 - (/q1a-) 1/2 ||a = 0(a%). Third, P 0 (||^|| > 


,)= 0 (. 


r 4/3+2r ; +8 


)• 


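Let $\mathcal{M}(\mathbb{R}^d)$ denote the class of all probability measures on $\mathbb{R}^d$. Define $p_j^* = 0$ for $j = N + 1, \ldots, M$.
Let $B_n = \mathcal{P}_n \times S_n$ be the set with

\[ \mathcal{P}_n = \Big\{ F \in \mathcal{M}(\mathbb{R}^d) : \sum_{j=1}^M |F(U_j) - p_j^*| \le 2\epsilon_n^{2db_1},\ \min_{j=1,\ldots,M} F(U_j) \ge \epsilon_n^{4db_1}/2 \Big\} \]

and $S_n = [\sigma_n(1 + \sigma_n^{2\beta})^{-1/2},\ \sigma_n]$. Note that $M\epsilon_n^{2db_1} \lesssim \epsilon_n^{2db_1 - d/\beta}|\log\epsilon_n|^{ds} \le 1$ and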


inf min (c 0 v x G t (- - l ))(C / j ) 1/2 > loge„|)- d > 

(Z,t) 6 RxR+ l<j<M K J/ 17 


.2dfei 


For every $\theta = (F, \sigma) \in B_n$, the $q$-integrated Hellinger distance $h(f_\theta, f_0) = O(\sigma_n^\beta)$. Proceeding
as in Theorem 3.1 of Norets and Pati (2014), we obtain that $\max\{\mathrm{KL}(f_0; f_\theta),\ V_2(f_0; f_\theta)\} = O(n\epsilon_n^2)$.
We now evaluate the probability of the set $B_n = \mathcal{P}_n \times S_n$. By applying Lemma 10 of
Ghosal and van der Vaart (2007),


inf DP Co vxG t (—l)(‘Pn) > exp(-M|loge n |) > exp {-c 1 e n d/p \ loge^l^ 1 ). 

(Z, £) £Kn 

Also, for the probability of the set $S_n$ under the $\mathrm{IG}(\alpha, b)$ law, which is denoted by $P_b(S_n)$, we have


inf P b (S n ) = inf 

b£K„ b£K n l-i 


n —b<T—a—l 


r(a) 


da 


> 6 “ exp (-V2b n /a n )a n a [(l + cr 2/3 ) a/2 - 1 ] > exp (-c 2 b„/a n ) 


for a suitable constant C 2 > 0 , provided that b n = 0 (log a n), with a > 0 , and b n = 0(cr n 1 ). 
Consequently, 


inf BP Co ^ xGt{ - l )(V n ) x P b (S n ) > exp (—c 3 e ra d/ ^| log e n | (d5+1)VQ ) > exp (-c 3 ne 2 ), 

provided that, for $\epsilon_n = n^{-\beta/(2\beta+d)}(\log n)^t$, the exponent $t > [(ds + 1) \vee \alpha]/(2 + d/\beta)$. To complete
verification of condition [A1], we show that, for some constant $c_4 > 0$,

sup sup P 0 ”( inf e n (ipy tl (6)) <-c 4 ne 2 n ) = o^N- 1 ). 

7 'eif„ees n '■7 : ll7-7'll<“» ' 


Fix 7 ' = (b r , Ik, t 2 ) £ B r x LkxT s and consider any 7 = (&, Z, t 2 ) £ B r x Lk x T s . For every 0 £ f? n , 

M 

„ in f /i/y T (e)(2/k) > „ in f - l ~ PsQ 

7= 7-7' < u n T 7= 7-7' <«n f J 

J = 1 


>T n (y)(l + « n )- 2 e - 12d -'-/ 0 


M 


X ^]f|K' || <a an Pj,a-'(.x)c/) a i (y Ik Cj)> 

3 =1 


where 


T n (y) := exp ( - [w 2 a 2 n + d y v 2 n + (w n o Jn + v n )d y 1/2 (a arl + \\y - Z fc ||)]). 

Over the set y$ = {(y 1 , ..., y n ) £ (R d »)" : E"=i E'/=i(2/i.7 - E oK']) 2 < d y n,T 2 }, where r n = 
0 (log K n) for k > 0 , 

T n (y) > exp ( - —(l + dJ / 2 )m n [a CT „ + 4max{dy 2 Z~„/2, r n }]), 
with m n := ma x{w n a (Tn , d}J 2 v n }. Set c n (x] a') := 1 ||C'|| <a an Pj,<T'( x )i we have c n (x ; a') > 


M 


E m 1 

j=i l||C5II<o ffn Pj — e 


*(l-2er i ) > 


;. Let F' be the distribution ob¬ 


tained by re-normalizing IIC r II h.+Q')' F° r = (-^7 (J, )> 011 fh e even t 3^o > f° r 

suitable constant C" > 0 , 

inf in{U y , n (9)) 

7Ml7-7 / ll<w n T ’ 7 

> X! log T 7 ^PT _ 2 nlo g(f + ^n) + ^ log Cn^i; cr') 
i=l /0l2/iFiJ i=1 

- -^[(1 + d 1/2 )m n (a (7n +4max{dy / 2 r n /2, r„}) + 3d x z„] 




fe'(yi\xj) 

fo(yi\xi) 


- C'nel 


i=1 

provided that z n = O(cr^e^) and m n = 0(o%e%(max.{a tTn , l n , r n }) _1 ). Also, we have 1 — P${y q ) = 
0((nr^) _1 ) and need that (nr^) _1 = o(7V“ 1 ). 

We show that the requirements of condition [A2] are satisfied. We start by describing a set $\mathcal{F}_n$
of conditional densities such that, for some constant $\xi > 0$,

\[ \log N(\xi\epsilon_n, \mathcal{F}_n, h) = O(n\epsilon_n^2). \tag{2.7} \]


We consider the same sieve $\{\mathcal{F}_n\}$ as in Theorem 4.1 of Norets and Pati (2014). For $H_n = \lfloor n\epsilon_n^2/\log n \rfloor$,


p = e 

—n 

let 


—nH~ 


1//3 =. 


) <Zn — ^ n , &n — 6 


Jn f ; 


for some constant T > 0, and //„ = (logn) Tl for some ri > 0, 


:= < ( ^2PjAx)(t>a{- - Tj) ) : ft' ^ £„> e [• 

l V 7==1 / xgA' 


Mn) Mn] j — 1 ) • • • ) 


W < H n , <T G [ff n , (J n 


For every fixed $\gamma' \in K_n$, let $\mathcal{F}_n(\gamma') := \bigcup_{\gamma :\ \|\gamma - \gamma'\| \le u_n} \psi_{\gamma',\gamma}^{-1}(\mathcal{F}_n)$, where $\psi_{\gamma',\gamma}^{-1}(\mathcal{F}_n)$ denotes the
preimage of the set $\mathcal{F}_n$ under $\psi_{\gamma',\gamma}$. We show that condition (a) is satisfied. Fix any $\gamma' = (b_r, l_k, t_s^2) \in K_n$.
Proceeding as in Theorem 4.1 of Norets and Pati (2014),

sup sup sup ||/e(-|a:) - (e)(-|a:)||i 

7 : N'T — I'W^Un OtzFnil') X^X 


< 


^k)g[i i - i ‘i + "'n-"|] + g|i 


^ Vn (f 4” Zn)Znbn / 

£ — +- Tr - ~ e n 

V-n zlK 

as long as v n = 0(a n e n ) and z n = 0(alb n e n /b n ). 

Regarding condition ( 61 ), it follows from (12.61) that sup 7gKr i n 7 (J-„( 7 ))/n 7 (B n ) < e Kne »/ 2 for 
a suitable constant AT > 0 arising from condition (& 3 ). 
To check condition (t> 2 ), for every 7 ' = (b r , Ik, t 2 ) G K n and any 9 G we find an upper 

bound on sup 7 . || 7 _ 7 /|| <Un , (g)(-|a;) by a function (not necessarily a density) /(• |x). For some 

constant co > 0, let a n = Co(logn) 1 /'’’. For ||y|| < a n /2, if ||Cj|| > a n and dl^ 2 l n < a n /4 then 
II V ~h~ C'll > IICjll/4- Setting r 2 := [1 - 16dl /2 (v n V w„)] _1 , for every w < H n , 

Uy n ( 6 ){y\x)'i-\\y\\<a n / 2 (y) 

UJ 

< ^2 Pj,<r (%)</><?(y -h- Cj) 

3 =1 

x exp (~2 max (^, Wn}(dl /2 + IICjIDIb -h- Cjll^ 1 !|y!|<a n / 2 (2/) 

< max{e( 3 / 2+ < /! ) 2 ^ v ”»)( 0 ” v W“»/ ff2 ) r n } 

UJ 

X ^ ^ Pj^( X )[^-\\CA\<an < f > lk+C,^(y) “b l||C)ll> a n^ft+Cj. r 'ncr(2/)]l||y||<a„/2(2/) 

3 =1 

< ma x{e (3/2+d i /2)2(,, " v,i, " )(Q " vr " )a ’* /(7r '' ,T ' )2 , r n } 

xe 6d ^/^l M < an/2 (y) 

UJ 

X ^ 1 Pj,cr' ( a ')[^llCl W^Zanfilk+Ci^rCr' (y) "b ^ IlCj II > a n filk+Cj ,r n 7r r <7' ( 2 /)] 

3 =1 

=: /(z/|z), 

where in the third inequality we have used the fact that Pj, a (x) < e 6dxZn ^' 7 ) 2 pj ^(x). Note that 
7 i r a' G [a n , d n \ and Ik + Cj 6 [— p n , Jln} dy for j = 1, ..., u>, with w < H n . Set the positions 

c' max |g(3/2+dJ, /2 ) 2 (i;nVii)„)(a„vr„)a„/(7r T .cr') 2 j r \ x g 6 d x z n /(a') 2 


and 


"0) : = 

3 =1 


L IIC'!I< 


.,J 


M <“»/2 

+ 1 IIC'II> 


■Zfc+c 'i,*r Ay) d y 


J\\v 


fak+ti.rn*, <T'(y) d y 


'IMI<'W 2 

and observed that c(ir) < 1 for all x G X, under the constraints = 0((ne n )~ 2 ) and v n V = 
0(((a n V Z n )a„ne 2 ) _1 ), the normalizing constant of IX"=i /(j/il®*) can be thus bounded above 


2=1 ■ 

Hie' x c(®<)] < (max{e (3 / 2+d i /2)2(, '" v, “" )(a " v ^ )a ”^ 2 , r n } x 


0 6d x 2 n (l+z„) 2 / 


*)' 


< 


exp (C 3 {v n V w n )(a n V l n )a n (ne 2 ) 2 + 48d x nz n (ne ^) 2 ) < e C3r 


for suitable constants $C_3, C_3' > 0$. Let $\mathcal Y_1 = \{y \in \mathcal Y : \|y\| \le a_n/2\}$. We are allowed to consider the
restriction to $(\mathcal X \times \mathcal Y_1)^n$ since, by virtue of assumption (iv),

P 0 n ((X x 34T) = ( C [ fo(y\x)q(x) dxdy) < e~ B ^ a ^ < e~ B ° n ^. 

\Jo J\\y\\>a n /2 ) 
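Recalling that, in the present setting, $\mathrm dQ_{\theta,\gamma'}/\mathrm dm = \sup_{\gamma:\,\|\gamma-\gamma'\|\le u_n} f_{\psi_{\gamma,\gamma'}(\theta)}(\cdot|x)\,q(x)$, in order to show that condition (b2) is satisfied, we need to prove that

$$\sup_{\gamma'\in K_n} \int_{\mathcal F_n^c(\gamma')} Q^n_{\theta,\gamma'}(\mathcal Z^n)\,\frac{\Pi_{\gamma'}(\mathrm d\theta)}{\Pi_{\gamma'}(B_n)} = o\big(N_n^{-1}e^{-C_2 n\epsilon_n^2}\big).$$

By inequality (2.6), it suffices to show that

$$\sup_{\gamma'\in K_n} \int_{\mathcal F_n^c(\gamma')} Q^n_{\theta,\gamma'}(\mathcal Z^n)\,\Pi_{\gamma'}(\mathrm d\theta) = O\big(e^{-En\epsilon_n^2}\big) \qquad (2.8)$$

for some constant $E > (C_2 \vee C_3)$, where $C_3$ plays the role of $C$ in (2.6). The integral in (2.8) can be thus split up: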


sup [ Q^(2 n ) n 7 ,(d 0 ) 

7 ’FK n 7 ') 


= sup 
7 ’£K n 


lFGM( R d ) 




1 2 


Ql Y ((x x yi)")ny(dfl) 


+ [ r Qll'W X yi) n )Hy'(dfl) 

■If eF-(y) J<Ln/2 


—: Si + S 2 + S 3 . 


To deal with the term $S_1$, we partition $(0, \underline\sigma_n) = \bigcup_{j=0}^{\infty} [\underline\sigma_n 2^{-(j+1)}, \underline\sigma_n 2^{-j})$. For every $j \in \mathbb N_0$, let
$u_{n,j} = e_n(\underline\sigma_n 2^{-j})$, with $e_n = o(1)$ so that $u_{n,j} \le u_n$. For every $\gamma' = (b', l', t'^2) \in K_n$, consider a $u_{n,j}$-covering
of $\{\gamma : \|\gamma - \gamma'\| \le u_n\}$ with centering points $\gamma_i$, for $i = 1, \ldots, N_j$, with $N_j \lesssim (u_n/u_{n,j})^3$.
For a suitable constant $A > 0$,

sup [ f Q^,((Axy 1 ) n ) n y (d0) 

YGKn JfgM( R d ) •)v'<g_ n 

( OO 

^ exp (nun tj [( 3/2 + dy /2 ) 2 (a n V l n )a n + 6<4]/(£„2 _(j+1) ) 2 + nu n ^ 
j =0 

x p b i (fen2“ a+1) , 2L„2- J ) 

1<2< A'j 

✓ OO 

= o( y^exp ^2ne„[(3/2 + d l J‘ 2 ) 2 {a n V r„)a n + 6d x \/(a n 2~ ( ' J+1 ' 1 ) + ne„o; n 2 _ ^ 

^ j=o 

Nj 

x exp (-(6„/aj2^)2(“- 1 )^(& i /a„)“- 1 

2—1 

= O (un(b n /a, J Q_1 exp (ne n g_ n +u n - log(e„a„)) 

OO v 

^ ' e -(2 3 {[6 rl -2ne n [(3/2+(iy 2 ) 2 (a„vr„)a n +6d x ]/CT Il -l}+i(l-a) log2) j 

3=0 ' 

= 0(e ~ Ane ») 

provided that e„ = o((ncr n ) _1 ), b n > (logn)~ v for some v > 0 and e n = 0(n~ 1 (a n V l n )~ 1 a~ 1 ). 
Concerning $S_2$, for a suitable constant $B > 0$,


S 2 < (max{e 4 ( 3 / 2+ ' 1 » /2 ) 2 (”“ v ““)(“” v '”)“” e " 2T ” t V n }) r 


x e 2AdxnZne 2Tnen (l + z n ) n sup P b ((a n /2, 00 )) < e 

beK n 


-Bnel 


because 


/»4er„ 


sup P b ((a n /2 , cx))) = sup / 
beK n beK n Jo 


» „—b r &~cx.— 1 1 _ 

—-e a dcr 

r(a) 

—46„ 


< (4& n d’^ 2 ) a_1 (l — e~ 46 ” 5 “ ) 

= (46„a- 2 )“- x ^ ^-(46 n a-- 2 ) fc 


fc=l 


< b n e 2Trae « exp (- 2 aTne 2 + a log 6 „) 

provided that z n = 0(e 2 ) and (v n V «;„) = 0(n~ 1 (a„ V r„) _ 1 a“ 1 e 2 ). 
Concerning $S_3$, for any $\epsilon \in (0, 1)$ and a suitable constant $D > 0$,


/ f 

JFGFSi'y') Jcr„ / 


Qlydxxyo”) n 7 ,(d0) 

' F£F%( 7 ') J(J_ n / 2 

< (' max |g4(3/2+dy 2 ) 2 (i; rl Vtu„)(a„vt„)Q„/CT^ r„})’ 1 e 24<i>,nZ "/ 2; ”+ n2: n 

X (1 + exp (-ne-K / 2 K v “n) c/z 2 /[2(1 + ^ 2 - 2]) < 


provided that = C^n^cr^e 2 ) and (u ra V iu n ) = 0(n~ 1 (a n V r n ) _ 1 a“ 1 < 7 2 e 2 ), with a n < 2 d^ 2 fl n . 

We now check that condition (b3) is satisfied. We show that there exists a constant $K > 0$
such that, for any fixed $\gamma' = (b', l', t'^2) \in K_n$, for every $\epsilon > 0$ and all $\theta \in \mathcal F_n(\gamma')$ such that the
$q$-integrated Hellinger distance $h(f_\theta, f_0) > \epsilon$, there exists a test $\phi_n(f_\theta)$ satisfying

$$P_0^n\phi_n(f_\theta) \le e^{-Kn\epsilon^2} \quad\text{and}\quad Q^n_{\theta,\gamma'}[1 - \phi_n(f_\theta)] \le e^{-Kn\epsilon^2}. \qquad (2.9)$$


By Corollary 1 of Ghosal and van der Vaart (2003), for every $\theta \in \mathcal F_n(\gamma')$ such that $h(f_\theta, f_0) > M\epsilon_n$,
there exists a test $\phi_n$, which is the maximum of all tests attached to probability measures that are
the centers of balls covering $\{\theta \in \mathcal F_n(\gamma') : h(f_\theta, f_0) > M\epsilon_n\}$, such that

P^cf n <N(Me n /4, h)e ~ n ^^ 2 and sup P?(l -</>„)< e ~^ M ^ 2 . 

6 eFn(Y) 


By inequality (2.7), the requirement on the type I error probability in (2.9) is satisfied. The second
requirement is satisfied provided that, for some constant $M'' > 0$, we have $h(f_{\psi_{\gamma,\gamma'}(\theta)}, f_0) > M''\epsilon_n$
for all $\gamma$ such that $\|\gamma - \gamma'\| \le u_n$. Since $h(f_{\psi_{\gamma,\gamma'}(\theta)}, f_0) \ge 2^{-1}(\|f_\theta - f_0\|_1 - \|f_\theta - f_{\psi_{\gamma,\gamma'}(\theta)}\|_1)$,
it is enough that $\sup_{x\in\mathcal X} \|f_\theta(\cdot|x) - f_{\psi_{\gamma,\gamma'}(\theta)}(\cdot|x)\|_1 \le M'\epsilon_n$ for some constant $M' < M$ so that
$M'' = M - M'$. This can be seen to hold as for condition (a). Inequality (2.8) then follows by
combining upper bounds on $S_1$, $S_2$ and $S_3$.

The proof is completed by noting that the assertion follows by choosing sequences $v_n$, $w_n$ and $z_n$
so that all the constraints that arose in the proof are simultaneously satisfied. □
Remark 2.1. Theorem 2.1 takes into account only a data-driven choice of the scale parameter of an
inverse-gamma prior on the bandwidth, but an empirical Bayes selection of the shape parameter could
be considered as well. In order to identify the mapping for the change of prior measure, it suffices
to note that, for $\alpha \in \mathbb N$, if $\sigma_r \sim \mathrm{Gamma}(1, 1)$ independently, $r = 1, \ldots, \alpha$, then $\beta/(\sigma_1 + \ldots + \sigma_\alpha) \sim \mathrm{IG}(\alpha, \beta)$.
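
The representation in Remark 2.1 can be checked numerically. The following is a minimal sketch, assuming the shape-scale parametrization in which IG(α, β) has mean β/(α − 1) for α > 1; the function names and the chosen values α = 5, β = 2 are illustrative and do not come from the paper.

```python
import random


def sample_inverse_gamma(alpha, beta, rng):
    """Draw beta / (sigma_1 + ... + sigma_alpha) with sigma_r iid Gamma(1, 1).

    Gamma(1, 1) coincides with the standard exponential law, so each sigma_r
    is drawn with expovariate(1.0). The ratio is then IG(alpha, beta).
    """
    total = sum(rng.expovariate(1.0) for _ in range(alpha))
    return beta / total


def monte_carlo_mean(alpha, beta, n_draws=200_000, seed=0):
    """Average n_draws simulated values (illustrative helper)."""
    rng = random.Random(seed)
    draws = (sample_inverse_gamma(alpha, beta, rng) for _ in range(n_draws))
    return sum(draws) / n_draws


alpha, beta = 5, 2.0                 # hypothetical shape and scale values
estimate = monte_carlo_mean(alpha, beta)
exact = beta / (alpha - 1)           # mean of IG(alpha, beta), valid for alpha > 1
print(f"Monte Carlo mean: {estimate:.4f}  exact mean: {exact:.4f}")
```

The simulated mean should agree with β/(α − 1) up to Monte Carlo error, which is consistent with the stated change of prior measure for integer shape parameters.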


2.2 Empirical Bayes dimension reduction in the presence of irrelevant 
covariates 


We now deal with the case where a $d_x$-dimensional explanatory variable is considered, but not all
the covariates are relevant to the response, whose conditional distribution may depend only on fewer
of them, say $0 \le d_x^0 \le d_x$, which, without loss of generality, can be thought of as the first $d_x^0$ of the
whole collection employed in the model specified in (2.3). Besides rate adaptation, another appealing
feature of the empirical Bayes procedure herein considered is automatic dimension reduction in the
presence of irrelevant covariates, on par with the posterior distribution corresponding to the prior
proposed by Norets and Pati (2014). The posterior automatically selects the model with the subset
of relevant covariates among all competing models.


Theorem 2.2. Suppose that the true conditional density $f_0$ depends on the first $d_x^0 \in \mathbb N_0$ covariates
and satisfies assumptions (iii)-(iv) of Section 2.1.1. Under the same conditions as in Theorem 2.1,
the corresponding empirical Bayes posterior distribution contracts at a rate
e n = n~P/( 2 P +d °' > (log nf , with d° := d x + d y and t > 0 a suitable constant. 

The proof follows the same trail as that of Theorem 2.1, the only difference arising from the prior
concentration rate, which turns out to depend on the dimension $d_x^0$ of the relevant covariates of $f_0$
because, for all the locations of the approximating Gaussian mixture, when $k > d_x^0$, the components
$\mu^x_{jk} = 0$ so that eventually the mixture does not depend on the covariates $x_k$ for $k = d_x^0 + 1, \ldots, d_x$.

As a simple consequence of Theorem 2.2 we have that, if $d_x^0 = 0$, then $f_0(y|x) = f_0(y)$ and the
response is stochastically independent of the predictor.
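
To make the gain from dimension reduction concrete, the sketch below evaluates the polynomial part of the rate in Theorem 2.2, which has exponent equal to the smoothness divided by twice the smoothness plus the effective dimension (relevant covariates plus response dimension). The function names and the numerical values are illustrative assumptions, and the logarithmic factor is left as an optional argument because the power t is not specified beyond being a suitable positive constant.

```python
import math


def rate_exponent(smoothness, d_relevant, d_y):
    """Exponent s / (2 s + d0), where d0 = d_relevant + d_y is the effective dimension."""
    d0 = d_relevant + d_y
    return smoothness / (2.0 * smoothness + d0)


def contraction_rate(n, smoothness, d_relevant, d_y, log_power=0.0):
    """n^(-exponent) * (log n)^log_power; log_power is a placeholder for t."""
    return n ** (-rate_exponent(smoothness, d_relevant, d_y)) * math.log(n) ** log_power


# Hypothetical setting: 5 covariates in total, only 1 relevant, scalar response.
smoothness, d_x, d_x0, d_y = 2.0, 5, 1, 1
adaptive = rate_exponent(smoothness, d_x0, d_y)   # uses relevant covariates only
naive = rate_exponent(smoothness, d_x, d_y)       # uses all covariates
print(f"exponent with relevant covariates: {adaptive:.4f}")
print(f"exponent with all covariates:      {naive:.4f}")
print(f"rate ratio at n = 10^4:            "
      f"{contraction_rate(10**4, smoothness, d_x0, d_y) / contraction_rate(10**4, smoothness, d_x, d_y):.4f}")
```

A larger exponent means faster contraction, so ignoring the irrelevant covariates improves the rate whenever fewer covariates are relevant than are recorded.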


3 Final Remarks 

In this note, we have proposed an empirical Bayes procedure for conditional density estimation 
based on infinite mixtures of Gaussian kernels with predictor-dependent mixing weights and have 
shown that a data-driven selection of the prior hyper-parameters can lead to inferential answers 
that are similar, for large sample sizes, to those of hierarchical posteriors in automatically adapting 
to the dimension of the set of relevant covariates and to the regularity level of the true sampling 
conditional density. An empirical Bayes selection of the prior hyper-parameters leads to
pseudo-posterior distributions with the same performance as fully Bayes posteriors, provided the estimator
$\hat\beta_n$ of the scale parameter of an inverse-gamma prior on the bandwidth takes values in a set $[\underline b_n, \bar b_n)$
such that $P_0(\hat\beta_n \in [\underline b_n, \bar b_n)^c) = o(1)$, a requirement that imposes restrictions on the sequences $\underline b_n$
and $\bar b_n$, in particular, on the decay rate at zero of $\underline b_n$, which is expectedly more important than that
at which $\bar b_n \uparrow \infty$. If the prior hyper-parameter has an impact on posterior contraction rates, then
the choice of the plug-in estimator is crucial and requires special care. This may, for example, rule
out the maximum marginal likelihood estimator for $\beta$. When the hyper-parameter does not affect
posterior contraction rates, as is the case for the mean $\lambda$ and variance $\tau^2$ of the Dirichlet base
measure, there is more flexibility in the choice of the estimator: different choices are indistinguishable
in terms of the posterior behavior they induce and empirical Bayes posterior contraction rates are
the same as those of any posterior corresponding to a prior with fixed hyper-parameters.

The result of Theorem 2.2 deals with isotropic Hölder densities but an extension to anisotropic
densities, that have different levels of regularity along different directions, is envisaged. In the
anisotropic case, the presented results provide adaptive rates corresponding to the least smooth
direction. Sharper rates can be obtained along the lines of Section 5 in Shen et al. (2013) combined
with the preceding treatment using component-specific bandwidths. Details are omitted.


APPENDIX 


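In this section, an adapted version of Theorem 1 in Donnet et al. (2014) is reported for easy
reference. Some additional notation is preliminarily introduced.

Let $(\mathcal X^{(n)}, \mathcal B_n, \{P^{(n)}_\theta : \theta \in \Theta\})$ be a sequence of statistical experiments, where $\mathcal X^{(n)}$ and $\Theta$
are Polish spaces endowed with their Borel $\sigma$-fields $\mathcal B_n$ and $\mathcal B(\Theta)$, respectively. Let $d(\cdot, \cdot)$ denote
a (semi-)metric on $\Theta$. Let $X^{(n)} \in \mathcal X^{(n)}$ be the observation at the $n$th stage from $P^{(n)}_{\theta_0}$, where $\theta_0$
denotes the true parameter. Let $\mu^{(n)}$ be a $\sigma$-finite measure on $(\mathcal X^{(n)}, \mathcal B_n)$ dominating all probability
measures $P^{(n)}_\theta$, for $\theta \in \Theta$. For every $\theta \in \Theta$, let $\ell_n(\theta)$ denote the log-likelihood ratio $\log(p^{(n)}_\theta/p^{(n)}_{\theta_0})$.

We consider a family of prior distributions $\{\Pi_\gamma : \gamma \in \Gamma\}$ on $(\Theta, \mathcal B(\Theta))$, with $\Gamma \subseteq \mathbb R^k$, $k \in \mathbb N$. Let
$\Pi_\gamma(\cdot \mid X^{(n)})$ stand for the posterior distribution corresponding to $\Pi_\gamma$. For any measurable function
$\hat\gamma_n : \mathcal X^{(n)} \to \Gamma$, the empirical Bayes posterior law $\Pi_{\hat\gamma_n}(\cdot \mid X^{(n)})$ is obtained by plugging $\hat\gamma_n$ into the
posterior distribution,

$$\Pi_{\hat\gamma_n}(\cdot \mid X^{(n)}) = \Pi_\gamma(\cdot \mid X^{(n)})\big|_{\gamma = \hat\gamma_n}.$$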
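
The plug-in construction can be illustrated on a toy conjugate model, which is an assumption made purely for exposition and is not the mixture model studied in this note: observations are N(θ, 1), the prior on θ is N(γ, 1), and the hyper-parameter γ is estimated by the sample mean, which is the maximum marginal likelihood estimator in this toy model. All function names are hypothetical.

```python
def gaussian_posterior(xs, prior_mean, prior_var=1.0, noise_var=1.0):
    """Posterior (mean, variance) of theta when X_i ~ N(theta, noise_var)
    and theta ~ N(prior_mean, prior_var)."""
    n = len(xs)
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum(xs) / noise_var) / precision
    return mean, 1.0 / precision


def sample_mean(xs):
    """Data-driven choice of the prior mean (the plug-in estimator)."""
    return sum(xs) / len(xs)


def empirical_bayes_posterior(xs, estimator=sample_mean):
    """Evaluate the posterior at gamma = gamma_hat(xs), i.e. plug the estimate in."""
    gamma_hat = estimator(xs)
    return gamma_hat, gaussian_posterior(xs, prior_mean=gamma_hat)


data = [0.8, 1.3, 0.4, 1.9, 1.1]     # illustrative observations
gamma_hat, (post_mean, post_var) = empirical_bayes_posterior(data)
print(f"gamma_hat = {gamma_hat:.3f}, posterior mean = {post_mean:.3f}, "
      f"posterior variance = {post_var:.4f}")
```

In this toy case the plug-in posterior is centered at the sample mean, since the prior mean and the data average coincide once the estimate is substituted for γ.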

The statement of the theorem follows. 


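Theorem 3.1 (Donnet et al. (2014)). Let $\theta_0 \in \Theta$. For every $\gamma, \gamma' \in \Gamma$, let $\psi_{\gamma,\gamma'} : \Theta \to \Theta$ be a
measurable mapping such that, if $\theta \sim \Pi_\gamma$, then $\psi_{\gamma,\gamma'}(\theta) \sim \Pi_{\gamma'}$. Assume that

[A1] there exist sets $K_n \subseteq \Gamma$ with $P^{(n)}_{\theta_0}(\hat\gamma_n \in K_n^c) = o(1)$, positive sequences $u_n, \epsilon_n \downarrow 0$, with
$n\epsilon_n^2 \to \infty$, for which $N_n := N(u_n, K_n, \|\cdot\|) = o(e^{n\epsilon_n^2})$ and sets $B_n \in \mathcal B(\Theta)$ such that, for some
constant $C_1 > 0$,

$$\sup_{\gamma\in K_n} \sup_{\theta\in B_n} P^{(n)}_{\theta_0}\Big(\inf_{\gamma':\,\|\gamma'-\gamma\|\le u_n} \ell_n(\psi_{\gamma,\gamma'}(\theta)) < -C_1 n\epsilon_n^2\Big) = o(N_n^{-1});$$

[A2] for every $\gamma \in K_n$, there exists a set $\Theta_n(\gamma) \in \mathcal B(\Theta)$ such that

(a) $\sup_{\gamma':\,\|\gamma'-\gamma\|\le u_n} \sup_{\theta\in\Theta_n(\gamma)} d(\theta, \psi_{\gamma,\gamma'}(\theta)) \le M'\epsilon_n$ for some constant $M' > 0$,

(b) for constants $\zeta, K > 0$ and $C_2 > C_1$,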

( 6 i) log N((e n , 0„(j), d ) < Knel/2 


and 


sup 7eifri n X (01 l ( T )) < 


n 7 (s„) 


(6 2 ) defined such that dQ^/dp^ := supy. || 7 '- 7 ||< Utl P^" ) y( e) > 

sup f o{ AT-V^), 

76K„ j0\e„(-y) 

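(b3) for any $\epsilon > 0$ and all $\theta \in \Theta_n(\gamma)$ with $d(\theta, \theta_0) > \epsilon$, there exists a test $\phi_n(\theta)$ with

$$P^{(n)}_{\theta_0}\phi_n(\theta) \le e^{-Kn\epsilon^2} \quad\text{and}\quad Q^{(n)}_{\theta,\gamma}[1 - \phi_n(\theta)] \le e^{-Kn\epsilon^2}.$$

Then, for a sufficiently large constant $M > 0$,

$$P^{(n)}_{\theta_0}\Pi_{\hat\gamma_n}\big(d(\theta, \theta_0) > M\epsilon_n \mid X^{(n)}\big) \to 0.$$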